Let's continue with our NYC 311 service requests example.

3.1 Selecting only noise complaints

I'd like to know which borough has the most noise complaints. First, we'll take a look at the data to see what it looks like:
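The dataframe itself isn't shown here, so this is a minimal sketch using a tiny hand-made stand-in. The name complaints, the "Borough" column name, and every row value are assumptions for illustration; the real 311 data has far more rows and columns.

```python
import pandas as pd

# Hypothetical stand-in for the 311 data: made-up rows, assumed column names.
complaints = pd.DataFrame({
    "Complaint Type": ["Noise - Street/Sidewalk", "Illegal Parking",
                       "Noise - Street/Sidewalk", "Noise - Commercial",
                       "Noise - Street/Sidewalk"],
    "Borough": ["MANHATTAN", "BROOKLYN", "BROOKLYN", "MANHATTAN", "MANHATTAN"],
})

# Peek at the first few rows to see which columns and values we have.
preview = complaints.head(3)
print(preview)
```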

To get the noise complaints, we need to find the rows where the "Complaint Type" column is "Noise - Street/Sidewalk". I'll show you how to do that, and then explain what's going on.
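Here is a sketch of that filter, run on a small made-up stand-in for the data. The dataframe name complaints, the "Borough" column, and the rows are assumptions; only the "Complaint Type" column and the "Noise - Street/Sidewalk" value come from the text.

```python
import pandas as pd

# Hypothetical stand-in for the 311 data: made-up rows, assumed column names.
complaints = pd.DataFrame({
    "Complaint Type": ["Noise - Street/Sidewalk", "Illegal Parking",
                       "Noise - Street/Sidewalk", "Noise - Commercial",
                       "Noise - Street/Sidewalk"],
    "Borough": ["MANHATTAN", "BROOKLYN", "BROOKLYN", "MANHATTAN", "MANHATTAN"],
})

# Keep only the rows whose complaint type matches exactly.
noise_complaints = complaints[complaints["Complaint Type"] == "Noise - Street/Sidewalk"]
print(noise_complaints)
```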

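If you look at noise_complaints, you'll see that this worked, and it only contains complaints with the right complaint type. But how does this work? Let's deconstruct it into two pieces. The first piece is the comparison of the "Complaint Type" column against "Noise - Street/Sidewalk", taken on its own.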
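The first piece, the comparison by itself, might look like this on a made-up stand-in dataframe (the name complaints and the rows are assumptions for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the 311 data: made-up rows.
complaints = pd.DataFrame({
    "Complaint Type": ["Noise - Street/Sidewalk", "Illegal Parking",
                       "Noise - Street/Sidewalk", "Noise - Commercial",
                       "Noise - Street/Sidewalk"],
})

# Comparing a column to a value gives one True/False per row.
is_noise = complaints["Complaint Type"] == "Noise - Street/Sidewalk"
print(is_noise)
```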

This is a big array of Trues and Falses, one for each row in our dataframe. When we index our dataframe with this array, we get just the rows where our boolean array evaluated to True. Note that when filtering rows this way, the boolean array must have the same length as the dataframe's index.
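The second piece, indexing with the boolean array, can be sketched like this. The dataframe and the hand-written masks are made up for illustration; the point is that a mask with the wrong number of entries is rejected rather than silently applied.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 311 data: made-up rows.
complaints = pd.DataFrame({
    "Complaint Type": ["Noise - Street/Sidewalk", "Illegal Parking",
                       "Noise - Street/Sidewalk", "Noise - Commercial",
                       "Noise - Street/Sidewalk"],
})

# One True/False per row: only the True rows are kept.
mask = np.array([True, False, True, False, True])
selected = complaints[mask]
print(selected)

# A mask with the wrong number of entries cannot be matched to the rows.
too_short = np.array([True, False])
try:
    complaints[too_short]
    length_error = None
except ValueError as err:
    length_error = str(err)
print(length_error)
```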

You can also combine more than one condition with the & operator like this:
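As a sketch, here is one way to ask for street noise complaints in Brooklyn only. The dataframe name, the "Borough" column name, and the rows are assumptions for illustration. If you write the conditions inline instead of naming them first, each one needs its own parentheses, because & binds more tightly than ==.

```python
import pandas as pd

# Hypothetical stand-in for the 311 data: made-up rows, assumed column names.
complaints = pd.DataFrame({
    "Complaint Type": ["Noise - Street/Sidewalk", "Illegal Parking",
                       "Noise - Street/Sidewalk", "Noise - Commercial",
                       "Noise - Street/Sidewalk"],
    "Borough": ["MANHATTAN", "BROOKLYN", "BROOKLYN", "MANHATTAN", "MANHATTAN"],
})

is_noise = complaints["Complaint Type"] == "Noise - Street/Sidewalk"
in_brooklyn = complaints["Borough"] == "BROOKLYN"

# & combines the two masks row by row: both must be True.
brooklyn_noise = complaints[is_noise & in_brooklyn]
print(brooklyn_noise)
```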

Or if we just wanted a few columns:
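A sketch of the same selection narrowed to a couple of columns. The "Borough" and "Descriptor" column names and all row values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the 311 data: made-up rows, assumed column names.
complaints = pd.DataFrame({
    "Complaint Type": ["Noise - Street/Sidewalk", "Illegal Parking",
                       "Noise - Street/Sidewalk", "Noise - Commercial",
                       "Noise - Street/Sidewalk"],
    "Borough": ["MANHATTAN", "BROOKLYN", "BROOKLYN", "MANHATTAN", "MANHATTAN"],
    "Descriptor": ["Loud Talking", "Blocked Hydrant", "Loud Music/Party",
                   "Loud Music/Party", "Loud Talking"],
})

is_noise = complaints["Complaint Type"] == "Noise - Street/Sidewalk"
in_brooklyn = complaints["Borough"] == "BROOKLYN"

# Filter the rows first, then pick columns by passing a list of names.
few_columns = complaints[is_noise & in_brooklyn][["Complaint Type", "Borough"]]
print(few_columns)
```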

3.2 A digression about numpy arrays

On the inside, the type of a column is pd.Series, and a pandas Series is typically backed by a numpy array. If you add .values to the end of such a Series, you'll get its underlying numpy array.
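A quick way to check both claims, sketched on a small made-up dataframe. A numeric column is used here because it is reliably backed by a plain numpy array; the column name and values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical one-column dataframe with made-up numbers.
frame = pd.DataFrame({"count": [10, 20, 30]})

column = frame["count"]
print(type(column))       # a pandas Series

inner = column.values
print(type(inner))        # the numpy array underneath
```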

So this boolean-array-selection business is actually something that works with any numpy array:
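For instance, with a small array made up for illustration:

```python
import numpy as np

arr = np.array([1, 2, 3])

# The comparison gives a boolean array of the same length...
mask = arr != 2
print(mask)

# ...and indexing with it keeps only the positions that are True.
selected = arr[mask]
print(selected)
```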

3.3 So, which borough has the most noise complaints?
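One way to answer this is to filter down to the noise complaints and count how often each borough appears with value_counts(). The sketch below runs on made-up stand-in rows, with an assumed "Borough" column name; the rows are arranged to mirror the conclusion in the text and are not the real figures.

```python
import pandas as pd

# Hypothetical stand-in for the 311 data: made-up rows, assumed column names.
complaints = pd.DataFrame({
    "Complaint Type": ["Noise - Street/Sidewalk", "Illegal Parking",
                       "Noise - Street/Sidewalk", "Noise - Commercial",
                       "Noise - Street/Sidewalk"],
    "Borough": ["MANHATTAN", "BROOKLYN", "BROOKLYN", "MANHATTAN", "MANHATTAN"],
})

is_noise = complaints["Complaint Type"] == "Noise - Street/Sidewalk"
noise_complaints = complaints[is_noise]

# Count the noise complaints per borough, largest count first.
noise_complaint_counts = noise_complaints["Borough"].value_counts()
print(noise_complaint_counts)

# The label with the highest count.
top_borough = noise_complaint_counts.idxmax()
print(top_borough)
```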

It's Manhattan! But what if we wanted to divide by the total number of complaints in each borough, so that the numbers are comparable? That would be easy too:
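A rough sketch of that division on made-up stand-in rows (the "Borough" column name and all values are assumptions, and noise_complaint_counts is a name chosen here for illustration). Dividing one Series by another lines the values up by borough label. On Python 3 this already yields fractions; the all-zero result discussed in the text is what Python 2 produced.

```python
import pandas as pd

# Hypothetical stand-in for the 311 data: made-up rows, assumed column names.
complaints = pd.DataFrame({
    "Complaint Type": ["Noise - Street/Sidewalk", "Illegal Parking",
                       "Noise - Street/Sidewalk", "Noise - Commercial",
                       "Noise - Street/Sidewalk"],
    "Borough": ["MANHATTAN", "BROOKLYN", "BROOKLYN", "MANHATTAN", "MANHATTAN"],
})

is_noise = complaints["Complaint Type"] == "Noise - Street/Sidewalk"
noise_complaint_counts = complaints[is_noise]["Borough"].value_counts()
complaint_counts = complaints["Borough"].value_counts()

# Share of each borough's complaints that are street noise complaints.
noise_share = noise_complaint_counts / complaint_counts
print(noise_share)
```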

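Oops, why was that zero? That's no good. This happens under Python 2, where dividing one integer by another with / performs integer division and throws away the fractional part. (In Python 3, / always performs true division, so the result is already fractional.) Let's fix it by converting complaint_counts into an array of floats.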
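To see the problem and the fix side by side on Python 3, the sketch below uses // to stand in for what Python 2's / did with integers, then converts complaint_counts with astype(float). The counts are made up for illustration, and noise_complaint_counts is a name chosen here.

```python
import pandas as pd

# Hypothetical per-borough counts, made up for illustration.
noise_complaint_counts = pd.Series({"MANHATTAN": 2, "BROOKLYN": 1})
complaint_counts = pd.Series({"MANHATTAN": 3, "BROOKLYN": 2})

# Integer (floor) division: every share smaller than 1 collapses to 0.
floored = noise_complaint_counts // complaint_counts
print(floored)

# Converting the denominator to floats forces true division.
noise_share = noise_complaint_counts / complaint_counts.astype(float)
print(noise_share)
```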

So Manhattan really does complain more about noise than the other boroughs! Neat.